A principled framework to assess the information-theoretic fitness of brain functional sub-circuits

In systems and network neuroscience, many common practices in brain connectomic analysis are often not properly scrutinized. One such practice is mapping a predetermined set of sub-circuits, like functional networks (FNs), onto subjects’ functional connectomes (FCs) without adequately assessing the information-theoretic appropriateness of the partition. Another practice that goes unchallenged is thresholding weighted FCs to remove spurious connections without justifying the chosen threshold. This paper leverages recent theoretical advances in Stochastic Block Models (SBMs) to formally define and quantify the information-theoretic fitness (e.g., prominence) of a predetermined set of FNs when mapped to individual FCs under different fMRI task conditions. Our framework allows for evaluating any combination of FC granularity, FN partition, and thresholding strategy, thereby optimizing these choices to preserve important topological features of the human brain connectomes. By applying to the Human Connectome Project with Schaefer parcellations at multiple levels of granularity, the framework showed that the common thresholding value of 0.25 was indeed information-theoretically valid for group-average FCs despite its previous lack of justification. Our results pave the way for the proper use of FNs and thresholding methods and provide insights for future research in individualized parcellations.


Introduction
The success of large-scale brain connectomics-which subserves a myriad of neuroimaging research endeavors based on fMRI [41,53,62], MEG [19], and EEG [54]hinges on choosing representations of functional connectivity that are as well-defined as possible.Functional connectomes (FCs) are often constructed by computing a statistical dependency measure, such as the Pearson correlation coefficient, across all specified pairs of the brain's regions of interest (ROIs) using the aggregated voxel level blood-oxygenlevel-dependent (BOLD) signals.However, constructing FCs from BOLD signals with activation delays (due to inhibitory-excitatory dynamics possibly causing negative ROI correlations) can significantly impact estimates of population-level FCs [53] and the associated functional brain network topological features such as nodes' centrality [5], global network measures [41], and geometry-topology relation [47].Recent efforts have focused on improving FC construction by taking into account neuronal signal activation delays [41] and negative correlations [62].Nonetheless, much effort is still needed to quantify the efficacy of each FC construction framework, especially in terms of preserving the "true" features of the population FCs that shed light on fundamental principles of the brain.
Functional sub-circuits, e.g., functional networks (FNs) [60], and their modularity characteristics [13,21,40,52] are crucial to understanding such fundamental neural principles, including brain complexity [12], differential configurational properties [21], modular structures [40,51], and information processing [7,9].Studies on the modular organizations of the human brain have also informed applied research on aging [14,39] and disorders including schizophrenia [6].Moreover, research consistently shows that executive subsystems in the brain are reproducible across many individuals at rest, e.g., [45,60], indicating a widespread application of these FNs in various studies-from control groups [38] to pathological investigations [17] and predicting individual differences [46].Even so, there have been few (if any) systematic studies addressing the validity of a common and rarely challenged practice in brain connectomics, which is applying one specific set of a priori FNs to multiple FCs.In other words, FC processing usually involves mapping an a priori fixed set of FNs onto the constructed FCs, across different subjects and fMRI task conditions, without examining whether those mappings are relevant information-theoretically fit to the constructed FCs, given the existence of human brain fingerprint [1,1,3,7,8,11,18,21,24,29,30].
Among the many decisions influencing whole-brain functional connectivity estimates like FCs and circuit-level representations like FNs, the choice of brain parcellations, i.e., how nodes in functional brain networks are defined, is undoubtedly one of the most critical steps.[31,48,49].In fact, this choice determines the network topology used in downstream analyses.Recent studies have shown that different levels of parcellation granularity can affect the identification of subject-level FC fingerprints [2,24].In an effort to register the raw neuroimaging data into a sequence of increasing granularity, Schaefer and colleagues have recently published a scheme of atlases that increase in network sizes.These parcellations refine the robust set of resting state networks initially identified by Yeo et al. [60], offering various granularity levels for in-depth analysis.Thanks to these advancements, the brain connectivity research community can now explore characteristics of sequential functional brain networks, especially those coupled with the corresponding a priori set of FNs.
Regardless of which parcellation scheme is employed during large-scale FC and FN analyses, another common practice in network neuroscience is thresholding (or, more generally, eliminating statistically spurious functional edges) based on some arbitrary rules or research hypotheses.Careful design of the thresholding process is central to ensuring scientific rigor not only in healthy control studies but also in those studying disorders such as schizophrenia [59], unipolar depression, and bipolar disorder [61].Unrigorous application of thresholding can therefore undermine the validity of such important studies by affecting downstream analyses, including parametric statistical tests [35] and network characterization [56].To mitigate such issues, various thresholding strategies have been proposed to retain particular desired attributes of the original weighted networks.These strategies include proportional thresholding aimed at keeping the absolute number of edges across different subjects and tasks [56], modular similarity [61], and percolation aimed at preserving the topological features of the original weighted graph [22].Spurious edge elimination also involves methods based on wavelets [59], mixture modeling [16], topological data analysis through persistent homology [36,37], branch-and-bound based algorithms (to study cognitive activity [54]), and orthogonal minimal spanning trees for dynamical functional brain networks [20].Furthermore, alternatives to thresholding treatment for FCs have also been proposed using hierarchical Bayesian mixture models.[33].However, this multitude of strategies further complicates the already complex decision-making process of brain data preprocessing and analysis.After all, how can one determine which combination of FC parcellation, FN partitioning, and edge pruning techniques is optimal for their dataset?To the best of our knowledge, no studies have offered a mathematically justified and robust process for choosing such combinations.
This work tackles the complexity posed by that abundance of choices and lack of theoretical justification.Our objectives are two-fold: i) formalizing and quantify the level of information prominence of a given fixed set of FNs across different subjects and tasks, and ii) using the level of prominence as guidance to eliminate spurious functional edges in whole-brain FCs.To do so, we utilize Schaefer parcellations [49] with nine distinct granularity levels, ranging from 100 to 900 nodes in 100-node increments.We first present an overview of Schaefer parcellations, as well as a formalization of stochastic block models (SBMs) and its relevance to our quest Section 2. We then propose an SBM reconstruction pipeline in Section 3. We wrap up with Results (Section 4) and Discussion (Section 5).Our framework can be generalized to any given pair of an FN partition and a parcellation (e.g.[32,55]).

2.
A principled framework to assess information theoretical fitness of brain functional sub-circuits/networks

Schaefer's atlas of cerebral cortex
The brain parcellations used in this work are sequential, in the sense that their granularity increases, ranging from 100 nodes to 900 nodes with increments of 100 nodes, and are registered on the cortical surface of the brain.These sequential atlases are made possible thanks to the work of Schaefer and colleagues [49].To ensure our framework is consistent with prior works in human brain connectomics [10,11], we added 14 sub-cortical regions, resulting in network sizes of 114, 214, ..., 914 nodes for all fMRI conditions in the Human Connectome Project (HCP) repository, Released Q3 [57,58].Schaefer parcellations are subdivisions of Yeo's functional networks [60], in such a way that any coarser-grained partition is a subdivision of or related to a finer-grained Schaefer one.For instance, let u 114 be a node in the Schaefer graph with n = 114 nodes.Every node is further subdivided into two nodes v ′ 214 and v ′′ 214 in the next Schaefer graph in the sequence, i.e., the one with n = 214 nodes.
In practice, the subsequent divisions from coarser to finer granularity of Schaefer parcellations are not perfectly hierarchical, in the sense that one brain region in the coarser parcellation is not perfectly parcellated into subsequently smaller regions the finer one.Nonetheless, for relaxation purposes, if a node associated with the coarser parcellation has significant spatial overlaps with those in subsequently finer parcellations of the Schaefer graph sequence, they are assigned to the same resting state network.

Stochastic Block Models (SBMs)
Stochastic Block Models (SBMs) have recently gained traction due to exciting developments in both theoretical and practical domains (see Preliminaries-Stochastic Block Models in Supplementary Information for further details on notations and a brief introduction).Theory-wise, phase transitions in the fundamental limits of community detection (or more generally, mesoscopic structures) were discovered through the measure Signal-to-Noise Ratio (SNR) [4].In the domain of brain connectivity, SBM has demonstrated its advantages in exploring and uncovering diverse types of brain functional sub-circuits (e.g., dis-assortative or core-periphery) beyond the traditional assortative mesoscopic structures [15,23].Specifically, Sandon and Abbe, in [4], laid out a comprehensive treatment of criteria for mesoscopic structure recovery for any pair of a networked system and an a priori set of communities (or functional networks in brain connectomic domain).Specifically, the recovery requirements were classified under: (i) Weak Recovery (also known as community detection); (ii) Almost Exact Recovery; (iii) Exact Recovery.
The recoverability of the ground-truth partition depends on the degree regime (indicated by the degree scaling factor s t ) in which the network resides.For instance, weak recovery only requires the necessary condition for a limiting graph (n → ∞) to be in the constant degree regime, i.e., O( 1n ).On the other hand, exact recovery requires the necessary condition (for limiting graph) that the graph is asymptotically connected, i.e., in the degree regime of logarithmic O( log(n) n ).The sufficient condition for all the recovery criteria is stated in the respective theorems with different proposed measures with sharp phase transitions, as seen in [4].‡ Further details on recovery theorems are located in Supplementary Information.
Here, we chose the weak-recovery requirements as guidance for whole-brain functional connectivity estimation for four reasons: (i) Although Schaefer parcellations with an increasing number of nodes allow us to project some empirical insights onto their degree regime, a rigorous theoretical argument on the degree regime is not possible for any empirical graph sequence.Hence, exact recovery of an a priori unique ground-truth partition is not relevant in the case of brain functional connectomes; (ii) Even in the empirical domain, we observe that both group-average and individual FCs become disconnected (i.e., the number of connected components is more than 1) after a relatively small threshold value in the interval τ ∈ [0.2, 0.3].Theoretically, a graph sequence is required to be connected, asymptotically, to fulfill the requirements for exact recovery.On the other hand, weak-recovery (detection of mesoscopic structures) offers a more realistic and relaxed set of criteria for this particular application.This facilitates estimating a whole-brain FC that is most suitable for an a priori set of FNs without evaluating the number of connected components of the thresholded FC.
(iii) Most (if not all) mesoscopic studies of brain functional sub-circuits such as [15,23] are based on pre-defined hypotheses, e.g., that the brain functional subcircuits involve a more diverse class of community than just assortative ones [15].Such assumption leads to the appropriate usage of different community detection algorithms such as Weighted Stochastic Block Models (WSBM) in the case of [15,23].As mentioned above, weak-recovery is equivalent to community detection in the theoretical SBM literature; (iv) No set of functional sub-circuits is universally agreed and uniquely identified as the ground-truth communities.Hence, all proposed brain functional sub-circuit parcellations,e.g., [60], are relative.‡ If a measure (say for weak or exact recovery) is below a certain algebraic threshold (stated in the respective theorems), recovery is not possible although the necessary condition is satisfied.• {G t } , ∀t ∈ N is the graph sequence; in the empirical domain, the number of graphs in the sequence is defined as • k is the number of communities/clusters; In general, σ is also referred to as a graph partition; is the vector containing cardinality of community where • C is the statistical summary of edge properties within and between communities in matrix form.Mathematically, where C bin ∈ N k×k + denoted the simple edge count matrix within or between communities and C wei denoted the weighted edge sum (also within or between communities); is the maximum number of edges within or between communities; is a k × k matrix filled with p i in the diagonal; is the expected node degree matrix, i.e. the expected number of connections a node in community i has with community j; • s t : scalable factor of degree regime in a graph sequence G t where t ∈ T ; • W st = [w ij ] st is the edge probability between 2 nodes in community i and j in terms of the scaling factor s t .§ We use W to denote the edge probability matrix with s t = 1; § It is worth noting that if w ij is the same for all i, j ∈ [k], then SBM collapses to classical ER random graph model is the community profile matrix where i column is the expected number of edges that community i has with all communities.Note that for weak-recovery (detection), scaling factor s t = 1.

Inference and extended usage
The basis of SBM parameter inference is reverse engineering by the maximum likelihood principle.Specifically, since both G and σ (subsequently, k = max u∈[n] σ u ) are priors, in expectation, we can infer SBM (P, W ) using the Bayesian approach as follows: where C bin is a simple edge count of M GA τ between or within blocks of communities whereas C wei is the sum of weighted edges of F C τ (also between or within communities).Specifically, The inference of matrix P is based on the law of large numbers [4].For W bin , we perform entry-wise divisions of matrix C bin by matrix C max , which infers the Bernoulli random variable parameter p representing the probability of successful edge formation between each pair of stochastically equivalent nodes within or between communities.In the case of C wei , note that we use the term computing instead of inferring because we have extended the usage of SN R to mesoscopic prominence measure.We use the absolute values ∥w uv ∥ to only consider the overall magnitude (and not the sign) of functional couplings within/between FNs.Technically, this inference is less challenging than traditional inference problems where σ is also a latent variable in the model and graph ensemble G is the only observable ensemble available.Specifically, (G, n, σ, k) ∼ SBM (P, W ) where G and σ are priors.

Weak Recovery of ground-truth partition
Definition 1. Weak recovery of a ground-truth partition can be rigorously equivalent to the existence of an algorithm that infers a partition that agrees with the ground-truth one up to max i p i + ϵ, ∀i ∈ [k].This level of accuracy is the minimal requirement for most community detection methods.
Theorem 1. (Sandon and Abbe [4]) Let (G, σ) ∼ SBM n, p, stQ n for p, Q arbitrary and s t = 1.If SN R > 1, then weak recovery is efficiently solvable; where and λ i is the i th eigen value of the community profile matrix P Q.
Weak recovery of given ground-truth communities means that through that algorithm, the recovered partition outperforms a random guess, i.e. max i p i , by a small factor ϵ. The criteria for weak recovery are driven by a hard threshold approach presented in the below theorem.Importantly, achieving weak recovery does not necessitate the graph being connected under an asymptotic regime.Loosely speaking, we only need every graph in the graph sequence to have a large connected component.In other words, we only need {G t∈T } to be in the constant degree regime, i.e. s t = 1.Consider a network of n nodes divided into two equal-sized ground-truth communities (i.e.n 2 nodes for each community).In a weak recovery scenario, it is feasible to accurately identify the community membership of each node with a probability marginally above 50%, say by an additional 5%.It implies that if an ensemble is generated under a constant degree regime, one can arbitrarily assign any community membership to isolated nodes, i.e. leaves; hence, exact recovery is impossible in this regime.On the other hand, for exact recovery, since W scales with n through the factor s t , the community profile matrix M consequently grows with the factor s t as well.
2.5.Problem Formulation and the fitness assessment framework 2.5.1.Problem formulation: Problem Statement: Given an a priori set of functional sub-circuits, we would like to investigate and subsequently quantify the "fitness" of such pre-determined partition for a given brain network topology.Problem Formulation: We formulate this problem as the "dual" of the primal community detection problems.[25,26,27,28].Specifically, in the primal problem, a partition (i.e., set of mesoscopic structures) is constructed for a given network topology.In the dual, we look to evaluate and subsequently quantify the fitness of a pre-determined (e.g., a priori ) set of mesoscopic structures for the given network.

Fitness Assessment Framework:
For a given pair of a complex network (e.g., functional connectome) and an a priori set of ground-truth communities (e.g., Yeo's functional sub-circuits), we propose the following steps to access the informationtheoretic fitness of ground-truth communities (Figure 1) as follows: 1: procedure reconFC(Γ, n, k, σ, S, P , W FC,M ) 2: for every Schaefer Granularity level and fMRI task do 3: Step 1: Compute the group-average FC (F C GA ) 4: Using all individual FCs (F C γ ) given a Schaefer parcellation and fMRI task, Step 2: Vetting 6: for ⃗ τ ∈ S do

7:
Compute the masked, binarized group-average FC: Infer SBM parameters and apply Theorem 1 end for 10: Determine the weak-recoverability sub-interval

16:
end for 17: end for 18: Step 4: Obtain optimal thresholding parameters and check their weakrecoverability 19: 22: end for 23: end for 24: end procedure Figure 1: Pseudo-code for reconFC routine using the number of individual FCs Γ, Schaefer granularity n, number of functional networks k, a priori partition σ, parameter space S, community assignment likelihood P , and connectivity pattern matrix W .Note that S could contain dim(S) parameters in general.In addition, in line 14, the notation G ≻ ⃗ τ ( * ) represents a particular configuration of matrix G that satisfies the parameter space S represented by the value of ⃗ τ ( * ).
(i) Step 1: Obtain an average representation (e.g., a group-average FC) from the collection of individual networks, binarize the group-average representation, apply Theorem 1 across the thresholding parameter space S, and yield the masked, binarized group-average FC; (ii) Step 2 (Vetting Step): Compute the SNR for the masked group-average FC and investigate the SNR across all finite combinations in S. Compute the the weakrecoverability sub-interval I ⊆ S; (iii) Step 3: Compute the a priori community prominence for each individual FC; note that this prominence can also be computed for the group-average FC.
(iv) Step 4: For each individual FC, obtain the ⃗ τ maximizing the prominence computed in Step 3. Check if τ opt belongs to the weak-recoverability sub-interval I.Note that similar to step 3, this step can also be performed on the group-average FC.

Application: A pipeline for thresholding functional connectomes
The below pipeline describes the process to compute the optimal threshold for a given fMRI condition, a Schaefer granularity, and a cohort in two particular cases: • individually driven threshold τ γ opt ; • constant (cohort-driven) threshold τ GA opt where GA stands for group-average Here, we see that the parameter space S reduces to the line search of threshold value τ = [0, 1] ∈ S. The pipeline contains four distinct steps: (i) Step 1: For each Schaefer granularity level and task, compute the binarized (masked) group-average FC (denoted as M GA τ ) using the entry-average of individual FCs (the number of individual FC is denoted as Γ) Analogously, we can also compute the SN R for the group-average FC (F C GA ) as follows: Infer matrix W bin of group-average FC First For Loop to compute Weak-Recoverability sub-Interval Second For Loop to compute individual subject's SNR across threshold values Third For loop to compute τ opt for all subjects, for a given fixed fMRI condition.
For Loop ∀τ ∈ [0, 1] : Example: τ = 0.30 Example: γ = 1 and τ = 0.30 Step 1 Step 2 Step 3 Step 4 Step 5 Notes: For a given, fixed fMRI condition (such as Resting) (*) Note that the thresholding step can also be applied for group-average FC (F C GA ).Note that the for-loop indicated by ( * ) is used to find the individualized optimal threshold for each subject, τ γ opt .One can substitute this for-loop by finding the unique cohort optimal threshold, τ GA opt , using the groupaverage FC.
Repeat step 3 for all threshold values τ ∈ [0, 1] and all individual FCs for a given fixed Schaefer parcellation and fMRI task pair; (iv) Step 4: (a) Obtain the threshold value that maximizes SNR of the thresholded FC and the corresponding optimally reconstructed whole-brain FC; Note that if the group-average FC (F C GA ) is used in Step 3 then: Note that one needs to check the optimal threshold against the weak-recovery subinterval, regardless of whether it is an individualized threshold (τ γ opt ) or a groupaverage one (τ GA opt ).

Results
In this section, using weak recovery criteria, we investigate the level of informationtheoretic prominence of an a priori set of FNs with respect to different FCs (both groupaverage and individual subject levels) across a range of threshold values.Additionally, we offer deeper insights into the use of SNR as a measure of the information-theoretic prominence of this predetermined set of FNs.
The dataset used in this paper contains 410 unrelated subjects from the Human Connectome Project repository, Released Q3 [57,58].This includes (test and retest) sessions for resting state and seven fMRI tasks: gambling (GAM), relational (REL), social (SOC), working memory (WM), language processing (LANG), emotion (EMOT), and motor (MOT).Whole-brain FCs estimated from this fMRI dataset include 9 distinct Schaefer granularity levels that parcellate the cortical regions into n = 100 to n = 900 nodes, with a 100 nodes increment for each parcellation.The functional communities evaluated in this framework include seven cortical resting state FNs from [60]: visual (VIS), somatomotor (SM), dorsal attention (DA), ventral attention (VA), frontoparietal (FP), limbic (LIM), default mode (DMN).Each Schaefer granularity has a corresponding Yeo's FN parcellation.Additional details about the dataset are available in Supplementary Information.weak-recovery stay fairly stable: τ ∈ [0.05, 0.8].The lower bound a w stabilizes faster than the upper bound b w , across Schaefer parcellations.Except for the low-resolution parcellation n = 100, the weak-recovery valid range is relatively stable and parcellationindependent.This implies that the information-theoretic relevance of an a priori set of FNs is, to some extent, parcellation-free.In other words, for all investigated granularity levels, the thresholded graphs are in the weak-recoverability regime, except for the complete ( τ ∈ [0, 0.05)) or empty (τ ∈ (0.8, 1]) graph extremes.Panel D of figure 3 shows further details on the FC density.This is rather interesting because, at those two extremes, networks will contain either too much noise (complete graphs) or too little signal (empty graph) for any highly putative partitions to be information-theoretically relevant.

Resting State: Group-Average versus Individuals
Based on panels (B) and (E) of figure 3, it is evident that all SNR profiles (including the group average and individual levels) behave non-monotonically across the threshold range.There exists a threshold value such that SNR is maximized in the investigated range τ ∈ [0, 1].In addition, all optimal threshold values, for both group-average and individual FCs, are within the weak-recoverability sub-interval (a w , b w ) for all investigated Schaefer granularity levels.Secondly, we see that both group-average and individual SNR profiles scale with n.This is because the scaling factor s t for the Schaefer FC sequence is not constant.In other words, as the graph size gets larger, one can expect the community profile matrix to represent the number of expected "friends" between FN i and j (e.g. between DMN and LIM) to get larger numerically.Further evidence on empirical exploration of the Schaefer graph sequence degree regime is located in Supplementary Information.
Thirdly, for a fixed Schaefer granularity level, the group-average SNR peaks higher and earlier across the investigated threshold range than that of an individual subject.Interestingly, the topological property of connected components for both individual and group-average FCs, across all Schaefer parcellations, also exhibit a similar trend.Specifically, according to Figure S1 (Supplementary information), individual FCs get fragmented earlier, i.e., the number of connected components surpasses 1 faster, compared to the corresponding group-average FCs for a fixed granularity level.Topologically and numerically speaking, averaging FC entries across the subject domain damps down the individual fingerprints presented as high-magnitude Pearson correlation values in FCs.This results in magnitude-wise smaller functional connectivity entries, which get annihilated by smaller threshold value τ .On the other hand, using the same analogy, one can see that it takes a higher threshold value for individual FC entries to be annihilated.

Individualized optimal thresholds
As one can observe from figure 5, the individualized optimal threshold varies across different individuals, which demonstrates strong evidence of the existence of FN functional fingerprint across subjects.In addition, the average of these individualized thresholds, for a given parcellation granularity, is roughly equal to the group-average optimal threshold.

Group-average: Resting State vs. fMRI Task Analysis
Next, we investigate the prominence of Yeo's resting state networks with respect to different fMRI conditions, including 7 tasks and the resting state, through SNR measures using group-average FCs across all Schaefer granularity levels and the entire threshold range.Using the resting state SNR profile as a baseline, we compare all task responses in these two scenarios: • constructing FCs with the maximum number of time points available for each fMRI condition; Figure 5: Individualized optimal threshold is derived using the SNR behavior of each individual for 4 distinct Schaefer's parcellations n = {100, 300, 600, 900}.
• for all fMRI conditions, constructing FCs using 166 time points, which correspond to the number of time points associated with the EMOTION task that is also the minimum across all conditions.
Firstly, in both scenarios, the maximum SNR values for all examined tasks surpass the hard threshold SN R = 1 for weak recoverability.Moreover, the optimal threshold τ opt consistently falls within the range (a w , b w ).Trivially, resting state SNR dominates all available tasks across all parcellation levels.This is expected because the selected set of FNs are Yeo's resting state networks.Secondly, working memory (WM) fMRI responds fairly consistently across all granularity levels in both scenarios.From an information-theoretic perspective, EMOTION is the most similar task to the resting state, with respect to Yeo's resting state networks.
Thirdly, in the maximum-timepoint case, with the exception of n = 100 parcellation, the SNR profiles for most tasks are roughly half the magnitude of the resting-state SNR.Furthermore, for all examined Schaefer parcellations, group-average task FCs appear to reach their SNR peak earlier than the resting-state counterpart.Further details are indicated in figure 4 -Panel A.
In the second scenario when the minimum number of time points is used across all

The SNR-driven inequality
It is important to check if SNRs are robust against randomness, i.e., whether they are a valid factor in deciding the threshold.To do so, we randomly shuffle Yeo's resting state networks and recompute the SNR response.We repeat the random shuffling procedure 100 times and record the results for all nine group-average FC induced by the nine Schaefer parcellations, each of which is under the REST 1 condition with scanning pattern LR.Results for RL pattern are available in Supplementary Information.
For every fixed Schaefer parcellation granularity, the null model SNR profiles are uniformly lower than those of all subjects across the entire thresholding range.Furthermore, the null model values do not exceed the hard threshold imposed by weak recovery criteria, i.e.SN R = 1.This observation holds true across all investigated Schaefer parcellations, as seen in panel F of figure 3. Interestingly, the SNR gets uniformly smaller as Schaefer parcellation granularity increases, as seen also in Panel F.
Collectively, given the SNR results obtained at rest, under task conditions, and null models, we empirically form an inequality relation between resting state and task fMRI-induced FCs in terms of SNR response and the corresponding level of prominence of Yeo's resting state networks across different fMRI conditions: This general order of SNR response is observed at the threshold τ that maximizes the objective function SN R by the weak-recovery criteria.At such optimal threshold values, all SNR profiles for task fMRI are in the weak recoverability region while still smaller in magnitude than that at the resting state.Together, these inequalities constitute an empirical lower-bound and upper-bound for SN R task , at least for all the tasks investigated in our study.

Maximum SNR and threshold relationship
As the granularity of Schaefer parcellation gets finer, the corresponding group-average SNR profiles get larger due to the natural scaling of the community profile matrix P Q.This observation applies to the majority of the threshold range.Moreover, per figure 3 Panel F, we see that optimal thresholds, e.g.τ opt , tend to decrease as the granularity level increases, which suggests that larger Schaefer FCs do not need to be thresholded as much.Another interesting observation is that with the exceptions of n = {100, 200, 900}, all other investigated granularity levels accept a very stable optimal threshold τ GA opt = 0.25.Being a computation pipeline that relies on discretized line search on threshold τ (of increments 0.05 for τ = [0, 1]), yielding this level of consistence of optimal value is unexpected.

Validation: Highly-putative partition back-test
The theory of weak recovery and its extended usage proposed here allowed us to argue for the relevance of using SNR as a measure that, via thresholding as a specific use case, guides the estimation of well-defined functional connectivity given the mapping of an a priori set of FNs.In this section, our goals are two-fold: (i) validate our framework by solving the "forward" problem (e.g., community detection) and compare the detected partition with the ground truth; (ii) benchmark different community detection algorithms to solidify the above validation.
Specifically, we juxtapose SNR as a guiding measure against objective-function community detection methods.One such method is Newman's Q-score maximization [42,43,44].Here, instead of using Q-score as an objective function for community detection, we use it as a guiding measure to investigate its behavior across the tested threshold range [0, 1].In broad strokes, the Q score, modularity score, measures the statistical difference between a network and its corresponding null model with similar topological properties such as the degree sequence.It can be computed as follows: where δ(•, •) and α are the Kronecker delta and tuning parameter (defaulted at α = 1), respectively.In network neuroscience, the majority of studies examining mesoscopic structures of brain functions heavily leverage the maximization of Q score, which unravels predominantly assortative communities, i.e., mesoscopic structures with denser internal edge density than the external one [50,51].SBM inference methods like Weighted SBM Inference, in principle, uncover a more diverse type of communities beyond assortative ones, such as dis-assortative and core-periphery communities [15].Because of such distinct differences in principle between the two types of approaches, Q score would provide a good benchmark for comparing the robustness of SNR against various community detection approaches.Note that for Weighted SBM inference, we assume a Poisson distribution for the weighted graph [34].Although other model assumptions are possible, our goal in this paper is not to select the most fitting model assumption but rather to investigate the differences in the communities detected using two theoretically different approaches.In other words, we are not looking to see if Q score or SN R picks up the exact threshold where the inferred partition is information-theoretically aligned with Yeo's FNs; rather, we are interested in seeing whether each of those two measures captures the threshold interval where the two partitions agree to a relatively high degree.To measure the information-theoretic agreement between the inferred and ground-truth partitions, we use adjusted mutual information (AMI), which is a measure adjusted to chance.Further details on the inference method and AMI are described in the Supplementary Information.
Here, our evaluation criteria are as follows: there would only be a threshold subinterval that are better aligned with the highly-putative (ground-truth) community assignment.In the forward direction, we apply 3 different community detection algorithms across the investigated threshold range [0, 1) and measure the corresponding AMI between the inferred and ground-truth partitions.A robust guiding measure, since "blinded" from the ground-truth partition, should have a similar behavior, compared to the AMI curve.
Firstly, per figure 7 right panels, we see that both community detection methods, namely Newman's Q-score Maximization and Weighted SBM Inference, yield very similar trends.Specifically, both AMI profiles go up and down crossing the threshold range.Further, AMI gets smaller as n gets larger, which is expected for graphs with increasing numbers of nodes.Interestingly, the threshold values that maximize the AMI for Newman's Q-score Maximization tend to shift left as n increases.We see this particular behavior with SN R in the earlier result section (Panel F of figure 3).Secondly, the Q score keeps a fairly steady rise in magnitude across the threshold range.Further, it does not appear that the Q score is parcellation dependent; this is expected because the measure is normalized by 2m.Moreover, Q score peaks and plateaus at a very high threshold range τ ∈ [0.6, 0.8].In that range, the thresholded FC is highly fragmented (Figure S1) with extremely low edge density 3 and ceases to retain interesting topological insights for further analysis.Lastly, we see that SN R driven curves, with an a priori set of FNs, behave very similar to AMI profiles of both objective-function approaches.On the other hand, the Q score keeps rising across the threshold range and starts plateauing towards the end, which indicates failure at picking a threshold useful for an a priori partition such as Yeo's FNs.Collectively, our results show, once again, that SN R computation on weighted, thresholded FCs provides excellent guidance for reconstructing a graph with the most information-theoretic relevance to a particular fixed set of FNs.

Discussion
In recent years, the network neuroscience field has been striving forward with many exciting discoveries that are becoming more and more relevant to clinical applications and personal medicine.In network neuroscience, this urges the need to improve a popular proxy of brain function, namely functional connectivity.Having the most proper, state-of-the-art mathematical representation of functional brain circuits allows for more accurate and confident positioning of research endeavors.In this work, we put forth a simple framework that allows improving the mathematical representation of brain functions given the use of an a priori set of functional networks.This framework also doubles as a clear evaluation tool for any specific combination of FC parcellation, FN partition, and edge pruning techniques applied to a large-scale brain dataset, thereby streamlining the complex yet crucial studies in network neuroscience.
Thresholding, which is an edge pruning technique used in post-FC processing, is seldom challenged as a standard practice that eliminates, albeit arbitrarily, statistically spurious edges.Since an increasing body of clinical research now involves FC thresholding in the data construction pipeline, careful scrutiny of thresholding is therefore imperative.We conclude that there is no single constant threshold value that is optimal across different parcellation granularity levels, such as the Schaefer ones.In particular, from coarser to finer Schaefer granularities, the optimal threshold value decreases.This result is partially observable in the behavior of matrix W across all studied Schaefer parcellations.According to figure 7, we see that for a fixed threshold value, as Schaefer granularity increases, Yeo's functional networks behave more in an assortative manner, i.e., denser internal edge density and sparser external one.We see that through a brighter diagonal and a darker off-diagonal regime of matrix W , across Schaefer parcellations with fixed threshold value τ .Information-theoretically, it means that a larger graph (in size) tends to contain more relevant information to unravel the ground-truth partition (in our study, seven Yeo's resting state networks); hence, we do not need to threshold the FCs as deeply as the lower granularity parcellations such as n = 100.This result also suggests that FC size is proportional to the level of prominence, or fitness, of the a priori set of FNs.Nonetheless, the exact limit of this behavior when granularity tends toward infinity is unknown, e.g., whether the optimal thresholding value will reach a plateau even if the granularity increases.Despite this uncertainty, we showed that for the majority of granularity levels between 100 and 900, the commonly used threshold of τ = 0.25 is a valid choice for group-average topology.
Moreover, when using SNR as a goodness-of-fit measure while fitting an a priori set of FNs onto the FC, while no significant differences are observed between resting state and task conditions for the low-resolution parcellation (n=100), distinct differences emerge at higher resolutions.There are two ways of interpreting this result: i) a priori FNs exhibit a poorer fit during rest compared to task states; ii) there is an intrinsic shift of functional network dynamics at the individual level between the resting state and the task condition.Furthermore, there is also strong evidence suggesting a wide variance in the individualized thresholds across all Schaefer parcellation granularity levels.In the same vein, our results also support the concept of individualized parcellation suggested by the work of Salehi and colleagues [48].While intuitive and insightful, individualized parcellation across subjects and tasks remains computationally expensive.To that end, our work offers a well-defined tool to examine the level of relevance a particular set of functional networks exhibits when mapped onto individual FCs under different conditions.In simpler terms, it allows us to, for the first time, quantify the individual differences (through information-theoretic gap) when the same atlas is mapped across cohort and/or task domains.This paves the way for alternative frameworks that build upon our work, potentially leading to task-dependent or subject-dependent parcellation methods beyond that proposed in [48].
Our work also extends the usage of the weak-recovery theorem by leveraging SNR as a goodness-of-fit measure.Specifically, our results suggest that for the majority of threshold values, the masked binarized FCs are in the regime of week recovery.However, an open question remains: when parcellated by the Schaefer atlas for a fixed individual and an fMRI condition, is the sequence of FCs in the exact recovery regime?Future studies are needed to address this information-theoretic gap between weak and exact recoverability requirements that is reflected by two measures: SNR for weak recovery and Chernoff-Hellinger distance for exact recovery.Although exact recovery is a stronger requirement, if the Schaefer graph sequence falls within the exact recovery degree regime, the mutual information between the inferred partition (through network inference and objective-based community detection methods) and the ground-truth one (e.g., Yeo's parcellations) will be theoretically higher.Furthermore, future work should address limitations in the fMRI voxel resolution, both spatial and temporal, and those in the corresponding Schaefer parcellations, particularly their impact on the SNR when fitting Yeo's functional networks.Specifically, further investigation should be done on the effect of voxel sizes (e.g., 2 mm isotropic for the HCP data set [58]) and the repetition time (e.g., 720 ms for the HCP data set [58]).
Lastly, our findings highlight two important points for brain connectomics research.First, because of the existence of individual brain fingerprints [11,24], we need to pay extra attention when applying a common, fixed atlas to individual FCs.Secondly, we show that thresholding FC matrices is not only an intuitive step during FC postprocessing (e.g., to eliminate statistically spurious edges) but also a necessary one if we would like to use such FCs, coupled with an a priori set of FNs, to support any research endeavor in brain connectomics.These results suggest a promising new direction: individualized and task-dependent parcellation methods as an alternative to fixed atlases like Yeo's.This opens up new directions for precision medicine, particularly targeting individual-specific neurological and neuropsychiatric disorders.
The purpose of this document is to elaborate on the machinery of the morphospace and other aspects such as the dataset and brain atlas used to analyze the data.The aim is to provide further analytic results in conjunction with those already presented in the main paper.

Number of Connected Components
In this section, we investigate the topological features of the Schaefer FC graph sequence across the entire threshold period τ ∈ [0, 1].Specifically, we examine the number of components across the threshold range and Schaefer granularity levels.
We use all nine available Schaefer parcellations with n = [100, 200, ..., 900] and their corresponding mappings of Yeo's seven resting state networks [17] for each granularity arXiv:2406.18531v2[q-bio.NC] 23 Jul 2024  To study this characteristic, we use resting state fMRI data, e.g., rfMRI.Without loss of generality, we select the first resting scan, i.e., REST 1 , with phase encoding LR.It is important to note that the connectivity (computed as the number of connected components) of the thresholded FC (where the absolute values of functional edges are set to zero -only applying step (a) above) is analogous to its binarized thresholded counterpart (where the surviving functional edges are set to one -applying both step (a) and (b) for any given threshold and Schaefer parcellation choice).The number of components is computed using the Python package networkx after converting the FC matrix to a graph object.
Firstly, for all considered Schaefer parcellations, we observe that the group-average FC fragments (e.g., splits into more than one connected component) earlier than the individual subjects' FCs.This is because the normalization of functional edges across the cohort domain neutralizes individual differences and zeroes out relatively faster across the thresholding range (Panel A).Moreover, it is also expected that the groupaverage number of connected components increases proportionally with the parcellation sizes, and for a fixed threshold value, the number of connected components in a coarser parcellation is always smaller than in a finer one (Panel B).
We also compute the differential change ∆C (in percentage) between two consecutive Cs across the threshold range as follows: where l is indexed over the threshold range.We observe that both the groupaverage (Panel C-top) and individual level (Panel C-bottom) show an empirical phase transition in the number of connected components.This phase transition occurs in the sub-intervals (0.05, 0.30) and (0.20, 0.45) for the group-average and individual levels, respectively.Although there is a numerical overlap between the two phases, the groupaverage transitions earlier than the individual one.

FN-Differential Identifiability I dif f and Empirical Schaefer Degree Regime
In this section, we investigate the behavior of the matrix W bin of group-average FCs, using Yeo's 7 resting state networks [17].To make some empirical observations about the Schaefer FC sequence degree regime, we examine the group-average masked FC, M GA , across all nine granularity levels and the threshold interval τ ∈ [0, 1] with an increment of 0.05.The reason we look only at the masked (binarized) FCs is because • The Sandon et al. [1] theorem on weak-recovery is written for binary graphs.Hence, the recoverability requirement on the degree-regime is only applied to the binary scaffold.
• We see that looking at weighted graphs is not appropriate in this case as the row (or column) sum of the FC matrix would yield the connectivity strength of a node, not its degree.
Here, we investigate the empirical degree regime of the Schaefer graph sequence based on the behavior of W bin .For all studied Schaefer granularity levels and threshold combinations, to infer W bin , we simply use the maximum likelihood rule as mentioned in the main text.Recall that matrix W bin = [w ij ], where w ij contains the probability that  a node u in community i is connected (e.g., a uv = 1) or not-connected (e.g., a uv = 0) to another node v in community j.Its entries are bounded between 0 and 1.Also, recall that in the previous section on the degree regime, the graph sequence is in a constant degree regime if the corresponding matrix W does not scale with n, e.g., s t = 1.
Here, we look at the behavior of the degree regime through a proposed measure, called FN-differential identifiability, inspired by Amico et al. [3], as follows: where i, j ∈ [k] and k = 7 in our study.Moreover, ⟨W ii ⟩ and ⟨W ij ⟩ are the averages of the diagonal and off-diagonal entries of matrix W , respectively.We formally define ⟨W ii ⟩ and ⟨W ij ⟩ to be the differential identifiability within (e.g., I F N s self ) and between (e.g., I F N s others ) FNs. Per Figure S2, for most threshold values (with the exception of τ = 1), there is an intensity shift in W from between-FN connectivity to within-FN connectivity strength as I F N s dif f increases across Schaefer parcellation granularity levels.In other words, within-FN identifiability I F N s self increases as between-FN identifiability I F N s others decreases.Moreover, the monotonic increase of I dif f also suggests that finer-grain Schaefer parcellations might reflect a higher level of information-theoretic prominence of the a priori set of FNs, given the group-average FCs.These results indicate that the Schaefer brain graph sequence resides in the diverging degree regime, with the exception of the trivial case τ = 1.Empirically, these results suggest that the Schaefer graph sequence is at least not in the constant degree regime, i.e., s t ̸ = 1.The null model is assessed by feeding a randomized partition that respects Yeo's FNs sizes.The number of simulations is 100, and the scanning session is LR.Results on the empirical distribution of randomized SNR scores are shown in Figure S3.

Notations
In this section, we describe some stochastic block model (SBM) fundamentals, along with some fundamental mathematical notions that are not included in the main text.For instance, a number (scalar) is denoted with a regular letter such as x, y; a default vector is denoted by a bold regular letter i.e., x, sometimes with or without a subscripted index, and is in column format.Matrices are denoted by bold, capitalized letters, i.e., A, while a set is denoted by a capitalized letter, i.e., S. Further, N and R are sets of natural and real numbers, respectively; [l] denotes all positive integers between 1 up to l.All other standard mathematical notations are assumed unless otherwise specified.

SBM Definition
The Stochastic Block Model (SBM) has a history of both depth and breadth, spanning across multiple disciplines.Here, we only extract relevant information regarding SBM literature relevant to our exploration.Stochastic Block Models (SBMs) are random graph models that generate ensembles with clusters.Specifically, they generate Erdos-Renyi (ER) subgraphs union with multi-partite graphs between those subgraphs.A traditional SBM generates binary graphs, i.e., networks with {0, 1} edges.Nonetheless, there are SBM models developed for weighted networks, which are called weighted SBM (or WSBM).

SBM Inference and Synthesis
(Binary) SBM.Since SBM is a generative model, it is essential to discuss how to synthesize ensembles using such models, e.g., network synthesis, and how to infer SBM parameters using the observable ensembles, e.g., network inference.In the context of our problem, we have a slightly different starting point as the partition is not latent, though generally, partitions are often inferred.Networks with an existing ground-truth partition are very rare; furthermore, those ground-truths cannot be defined in an absolute sense.The majority of SBMs are defined as follows: However, in the context of our paper, the partition is not latent.Specifically, (G, σ, k) ∼ SBM (p, W ) for our application.
In the case of G ∼ SBM (k, p, W, σ), SBM seeks a partition that divides network G into k communities.The probability that two nodes are connected to each other is governed by the probability W σu,σv .To fit SBM onto a network, one needs to estimate W = [w ij ], ∀i, j ∈ [k] (meaning that k is an a priori condition for fitting) along with the community label σ u , ∀u ∈ [n].Assuming that each edge is drawn independently from identical distributions, the probability that a network G = A = [a uv ] is generated (synthesized) from a priori W and σ (prior beliefs) is as follows: for symmetric networks.From the inference standpoint, the Bayesian posterior probability can be computed as follows: where P(W, σ) represents Bayesian prior beliefs.If there is only one W (hard constraint, Piexoto) that is comparable to network A and partition σ, then we can drop the summation notion, resulting in: The hard constraint assumption is a very standard technique to isolate the eventual partition σ for inference purposes.Note that the adjacency structure A is, of course, "hard" (there is only one ensemble A).Since P(A) is also fixed, maximization of posterior probability which is also understood as the minimization of the description length (DL, measured in bits) of ensemble A using partition σ.Once again, the hard constraint assumption yields that the description length ultimately only depends on: In the binary SBM case, it follows that: Hence, minimization of DL is equivalent to maximizing the log-likelihood function.
Weighted SBMs.The assumption of binary edges could be unfitting for some applications, including functional brain networks where there is a need to express different levels of functional coupling strength numerically.In such cases, we need to introduce the structure of covariates (denoted as x = [x σu,σv ]) to model the weights.In this case, the prior is written as follows: It follows that the posterior probability becomes: Using a similar technique in the binary case, one can estimate the covariate structure first so that joint probabilities with x, e.g., P(x, σ) and P(A, x), do not alter the posterior distribution behavior.Hence, the posterior belief is proportional to the priors, which can be written as follows: Ultimately, the task of finding the "ground-truth" partition σ depends on the likelihood of prior beliefs.This is equivalent to maximizing P(A | x, σ).The first task is, of course, to estimate x as A is already available.We notate the covariate structure x to be more integrated with probability distribution parameter notations P(X = x).Specifically, we assume that the realized FC edge weights are drawn from some distributions with specific parameter(s).Model Selection.There are different approaches to model selection (e.g., which edge weight distribution one should use, given the empirical data).In the context of this paper, given the empirical distribution of FC edge weight and the usage of absolute functional connectivity (non-negative pair-wise edges), we shortlist two candidate distributions: exponential (continuous) and Poisson (discrete counterpart).Each choice has its pros and cons.For instance, choosing the exponential distribution allows us to stick with continuous ensembles of functional edge weights, which is consistent with how FC edges are computed using Pearson correlations.However, in a continuous distribution, the probability of an FC edge taking on a particular value is zero by definition.Yet, the functional connectome is sparse [5], e.g., the majority of pairwise interactions between two brain regions are non-existent.Hence, using the exponential distribution will not suffice.A common approach is to use a different distribution such as the Binomial to model edge (non-)existence and connect the two distributions using a weighted average (see the methods section in [5] for further details).This will force the modeler to make a precursor assumption on weight value, which is not ideal.
On the other hand, using the Poisson distribution (a discrete counterpart of the exponential distribution) offers us the distinct advantage of modeling a non-zero probability of getting zero-valued functional edges.Recall that the Poisson probability density function is as follows: Clearly, P(X = 0) = e −λ > 0, which ultimately depends on λ inference based on empirical observations of functional edges.Nonetheless, the shortcoming of using a discrete distribution is precisely the advantage of using the exponential one: being able to model edges in a continuous manner.To overcome this shortcoming of discrete distribution usage, we convert functional couplings (computed by Pearson correlations, which are nicely bounded between [−1, 1]) to percentage points and round to the nearest integer.For instance, if a functional edge has a value of a uv = 0.588, the weighted graph will take a uv = 59.The reason for rounding to the nearest integer is that the Poisson distribution takes on non-negative integer values N + .Note that using the Poisson Distribution makes no essential topological changes to the original FC other than rounding functional couplings to the nearest integer.
In this paper, we use the Poisson distribution for degree sequence as proposed by Karrer and Newman [8].In the next sections, we review the inference procedure (as proposed in [8]) for both assumptions: • Non-degree-corrected WSBM; • Degree-Corrected WSBM.
WSBM Inference Procedure: In this paper, we use the method described at https://graph-tool.skewed.de/by Tiago Peixoto.Further treatments on weighted SBM can be found in [10].After reviewing the inference approaches, we compare the philosophical similarities and differences between WSBM inference and Q score modularity.
Standard (Non-degree-corrected) WSBM The review of non-degree-corrected (NDC) WSBM is provided in a well-cited paper by Karrer and Newman [8].The authors assumed such a distribution for multi-graph ensembles, where edges can take on integer values larger than 1.In this case, the prior probability can be written as follows: It is important to note that the expected adjacency structure in this case is where Y ∈ [0, 1] n×k is the node community membership matrix, i.e., y ul = 1 if and only if node u is in community l ∈ [k].Note that self-loop edge weight cannot be counted twice.For symmetric networks where A uv = A vu and x ij = x ji , the above prior probability can be written as follows: where |Ω i | is the cardinality of community i, C ij is the counted number of edges between community i and j which can be simply computed by: where δ is the Kronecker delta function as defined in the main text.Similar to the binary case, the log-function is then be: where Θ(G) is the quantity dependent on ensemble G (such as |Ω i | or A uv ) which has no impact onto the logarithmic function behavior (i.e.not impacting the optimal value of this function).The inference process reduces to maximizing: Note that here, we drop A (ensemble adjacency structure) just to ease notation usage and emphasize which variable(s) the likelihood function depends on.To optimize the above function, one can use differential calculus as follows: Setting the first derivative to zero, e.g.L ′ = 0, we obtain: Note that now we have estimated x, i.e., the covariate structure, the likelihood function can be written as follows: Dropping the constant 2m (ensemble node's degree sum) and substituting the estimated covariate x, the log-likelihood function can now be written as: Using simple algebra, the log-likelihood function can be rewritten as: where, again, Θ is a constant function based on ensemble G.
Let Y and Z be the random variables representing community assignment on one end of a stub (half-edge).Then we can build a joint probability distribution between Y ′ and Z ′ as follows: On the other hand, the randomized counterpart distribution of these random variables (with the same a priori partition σ) is In this case, edge formation (from two stubs) is completely random with probability |Ω i | n and n for each stub.Overall, the likelihood function becomes: On the other hand, the Kullback-Leibler Divergence between two probability distributions P (x) and Q(x) is defined to be: where x ∈ X is the random variable taking on values in the sample space X .Then the log-likelihood function above can be thought of as an information theoretic measurement between the "ground-truth" probability distribution x(σ) and the corresponding null distribution x(null).If we only look at what happens within communities, the quality function becomes: If we substitute the estimated P σ and P null above into this equation, we obtain: Of course, we also have the log-function describing the differential information description requirement between communities: We will compare the within-community L within with the modularity function Q score in the subsequent sections.
where s i = u|σu=i d u is the number of half-edges in community i.Note that there are |Ω i | terms of θ u for each community.We see that L 1 is the entropy of the probability distribution representing the random variable θ, i.e., the probability that an edge in community i lands on u for which σ u = i, ∀i.This entropy is minimized when θu = d u u d u Here, it is important to note that if we choose a random uniform distribution for the random variable θ (e.g., θu = 1 |Ω i | ), we obtain minimized L 1 , which reduces L.

Difference between Non-degree-corrected and Degree-corrected Models
Plugging in the estimated parameters for both cases of WSBM, we obtain: which is the Kullback-Leibler divergence between P (x) (same as in the NDC case) and Q DC (x).In other words, for the DC case.Recall that for the NDC case, the null model is: Thus, the best fit to the NDC WSBM is the partition that most surprises the Erdos-Reyni random counterpart while for the DC WSBM case, it is the group assignment that is most surprising to the random model with the same empirical degree sequence.and u,v∈i where s 1 is the total number of half-edges (stubs) that originate from nodes in community i.
We also look at the null model from another perspective: the event of a stub (half-edge) exists at node u with probability P u (stub) = du 2m , likewise, at node v with P v (stub) = du 2m .These two independent events need to happen sequentially to form an edge between node u and v with probability Note that here, no community indication is available for either node u or v, which implies a null model (i.e., random partition) of σ (ground-truth).
The Q-score can be applied to both binarized or weighted graphs.In this case, for each threshold value, the Q-score is computed for the weighted group-average FCs across Schaefer granularity levels.Maximizing modularity has been shown to unravel assortative communities, while SBM has been shown to uncover different types of communities, beyond assortative ones [5].

Philosophical Similarity between Log-likelihood Function and Q-score
In this section, we compare the NDC, DC WSBM, and Q-score approach by first revisiting their formulas: (i) The NDC log-likelihood function (within communities): (ii) The DC log-likelihood function (within communities): (iii) Q-score modularity function, using community block format: It is very interesting (yet, not surprising) that the two most well-known methods for community detection are based on a similar principle of comparing a structureless counterpart (that has some similar topological characteristics) with the network at hand (which is hypothesized to have some latent structure of communities).In both approaches, the hypothesized distribution of random variables Y ′ and Z ′ for withincommunity is actually the same: Obviously, there is a difference between the null model choice between the Q score and NDC WSBM approach as the latter does not feature the observed network degree sequence while Newman's Q method actually does.This shortcoming is resolved with the DC WSBM approach as mentioned in the previous section.In fact, the DC WSBM and Q score null model is actually the same.Specifically, for DC WSBM, the null model is while for the modularity approach, it is also: What, then, is the shortcoming of the Q score approach?It does not emphasize what happens with the "between" community dynamics.To be clear, one can rewrite the Q score so that it reflects between-community edges as proposed by Fortunato [6] as follows: where Cut = m − 1 2 i C ii being the number of inter-community edges and E(Cut) is its corresponding expected counterpart.Basically, modularity would like to maximize the within-community edges (equivalently, minimize the between-community edges).Hence, it biases towards assortative community assignments.
On the other hand, the log function of WSBM has also incorporated what happens between communities through L between .This is why SBM inference method shines over traditional Q maximization techniques for its ability to uncover a more diverse range of community structures beyond assortative ones.

Statistics to compare Q max and SBM-inference community detection approaches
Given the node set of N elements S = {s i | i ∈ [N ]}, to quantify the amount of shared information between two partitions (e.g., a common approach would be using mutual information score, which can be quantified as follows: where R and C are the number of clusters in partition vectors σ 1 and σ 2 , respectively; P σ 1 σ 2 and P σ 1 are the joint and marginal probability distributions, respectively, between two discrete random variables representing two realized partitions.A typical initialization step is to build a contingency table which indicates the number of common nodes has in common between cluster σ 1 r and σ 2 c :  where n rc represents the number of common entities between cluster σ 1 r and σ 2 c ; the row and column marginal sums are denoted as ⃗ a = a r and ⃗ b = b c , respectively.By construction, r a r = c b c = N .The expected mutual information for a random partition with the same contingency table has a closed-form formula as proposed in [16]: The entropy associated with the two partitions are: N is the probability that an element picked randomly from set S belongs to σ 1 r .Analogously, Finally, putting all components together, adjusted (for chance) normalized mutual information (AMI) can be computed as follows: Note that there are other ways to average the independent entropy of the two partitions such as arithmetic.In this paper, we use the maximum between the two entropy quantities.

Graph Sequence and Required Topological Features for Recovery
A graph sequence is a mathematical series of graph ensembles generated by some fixed rules, denoted by {G l } where l is the sequence index.For example, one can generate an ER random graph sequence denoted as G t (t, p) with fixed p and its limiting graph denoted as G t=∞ (t, p), also known as a graphon.In recent developments in SBM (k, p, W ) theory, it is important to note the theoretical scaling characteristics of model parameters such as k, p, and W , and make necessary assumptions, i.e., which scales with t, and which stays constant as the graph size grows to ∞.
Firstly, it is common to assume that p and k do not scale with t; hence, the number of communities and their respective sizes do not grow with t [1].In other words, communities are assumed to have linear sizes [1].Moreover, matrix W has theoretical ties with an important topological characteristic of a graph sequence, such as the degree regime.
The importance of the degree regime lies in its relations with graph connectivity.There are two important degree regimes relevant for graph partition recoverability: • Constant Degree Regime: In this regime, the connectivity pattern is fixed (independent of Schaefer granularity levels).Asymptotically, node degrees do not scale with graph size, i.e., W = O(n −1 ).In random graph theory, this is the degree where an ER graph is expected to have a giant component.This regime satisfies the minimum requirement for the weak-recovery criteria, which will be formally defined later.
• Diverging Degree Regime: In this regime, the connectivity pattern varies with graph sequence sizes.Asymptotically, node degrees scale with graph size at a scalable factor s t , i.e., W = O(log(n)n −1 ).In random graph theory, this degree regime generates a connected ER ensemble, in expectation.This regime satisfies the minimum requirement for exact recovery, which will be defined in a later section.

Neuroimaging Data Acquisitions
The fMRI dataset used in this paper is available in the Human Connectome Project (HCP) repository (http://www.humanconnectome.org/),Released Q3.The processed functional connectomes obtained from this data and used for the current study are available from the corresponding author upon reasonable request.Please refer to the detailed descriptions below on the dataset and data processing.We first describe the acquisitions of raw neuroimaging data from 409 Unrelated Subjects chosen from the list of 1200 participants by Essen et al. [14,15] in the Human Connectome Project (HCP) release.This subset of participants ensures that no two participants have any family relations, sharing parents or being siblings.This selection is particularly critical to avoid any confounding effects in our subsequent analyses, such as group average analysis, due to family structures.
Per HCP protocol, all subjects gave written informed consent to the HCP consortium.The two resting-state functional MRI acquisitions (HCP filenames: rfMRI REST 1 and rfMRI REST 2 were acquired in separate sessions on two different days, with two distinct scanning patterns (left to right and right to left) in each day, [7], [15], and [14] for details.This release also includes data from seven different fMRI tasks: gambling (tfMRI GAMBLING), relational reasoning (tfMRI RELATIONAL), social (tfMRI SOCIAL), working memory (tfMRI WM), motor (tfMRI MOTOR), language (tfMRI LANGUAGE, including both a story-listening and arithmetic task), and emotion (tfMRI EMOTION).Per [7], [4], three tasks MRIs are obtained: working memory, motor, and gambling.The local Institutional Review Board at Washington University in St. Louis approved all the protocols used during the data acquisition process.Please refer to [4,7,13] for further details on the HCP dataset.

Constructing Functional Connectomes
We used the standard HCP functional preprocessing pipeline, which includes artifact removal, motion correction, and registration to standard space, as described in [7,13] for this dataset.For the resting-state fMRI data, we also added the following steps: global gray matter signal regression; a bandpass first-order Butterworth filter in both directions; z-scores of voxel time courses with outlier eliminations beyond three standard deviations from the first moment [9,11].
For task fMRI data, the aforementioned steps are applied, with a relaxation for the bandpass filter [0.001 Hz, 0.25 Hz].Starting from each pair of nodal time courses, Pearson correlation is used to fill out the functional connectomes for all subjects at rest and seven designated tasks.This would yield symmetrical connectivity matrices for all fMRI sessions.

Brain Atlases
The brain atlases used in this work are sequential, in the sense that their granularity increases, ranging from 100 nodes to 900 nodes (increment of 100 nodes each time), registered on the cortical surface of the brain.These sequential atlases are made possible thanks to the work of Schaefer and colleagues [12].Similarly to references [2,3], 14 sub-cortical regions were added, as provided by the HCP release (filename Atlas ROI2.nii.gz).We accomplished this by converting this file from NIFTI to CIFTI format using the HCP Workbench software [(http://www.humanconnectome.org/software/connectomeworkbench.html), with the command -cifti-create-label.The resultant sizes of ROI-based connectomes are, hence, 114, 214,..., 914 nodes for rest and any given fMRI tasks.Mathematically, we denote the Schaefer parcellation sequence to be G t ℓ where ℓ ∈ [9] and t ℓ = [114, 214, ..., 914].
Moreover, Schaefer parcellations are also coupled nicely with further subdivisions of Yeo's functional networks [17] so that the partition associated with a coarser Schaefer graph is related to that of a finer-grained Schaefer one.For a fixed Schaefer granularity (indexed t ℓ ), we denote the corresponding Yeo's resting-state networks to be σ t ℓ .
For instance, let u 114 be a node in the Schaefer graph with n = 114 nodes and a community label In fact, we can generalize this as follows: Definition 1.Let l, q be graph sequence indices and σ be the network partition.If (i) u n l = ∪u nq s.t.l < q, u n l ∈ V (G n l ), u nq ∈ V (G nq ); In practice, the subsequent divisions from coarser to finer granularity of Schaefer parcellations are not perfectly hierarchical in the sense that one node in the coarser parcellation does not perfectly parcellate into subsequently smaller ROIs in the finer one.Nonetheless, in this context, we can relax the condition (i) as follows: if the node associated with the coarser parcellation has the majority of spatial overlaps with the ones in subsequently finer parcellations of the Schaefer graph sequence, then they are assigned to the same resting state network.

2. 3 .•
SBM description, inference and extended usage 2.3.1.Model Description In this section, we define some of the key components of SBM.Other fundamental mathematical notations are referred to in the section Stochastic Block Model Preliminaries in Supplementary Information.• G = [a uv ] = F C, weighted-graphs M, binarized-graphs : network/graph (e.g.FCs in the context of this work); • V (G) = {u}, and E(G) = {uv | u, v ∈ V (G)} be set of vertices and edges, respectively; The size and order of a network are denoted by |V (G)| = n and |E(G)|, respectively; bin (b) Repeat this computation for all threshold values, apply Theorem 1 to determine the weak-recoverability sub-interval (a w , b w ) ⊊ τ = [0, 1] for the group-average FC, i.e.M GA τ (iii) Step 3: For a given individual FC and threshold value τ , compute the associated thresholded FC, i.e., F C γ τ , and then compute the Stochastic Block Model (SBM) parameters for F C γ τ .Extend the usage of SN R as a mesoscopic prominence measure: W ) n=100: no.of brainROIs; k = 7: number of FNs σ = [k] n = [7] 100 : a priori set of FNs; Ω = [|Ωl|]∀l ∈ [7];

ForFigure 2 :
Figure2: Example of the FC reconstruction routine based on the Schaefer granularity level of 100 nodes and resting-state fMRI with scanning pattern LR.Note that the for-loop indicated by ( * ) is used to find the individualized optimal threshold for each subject, τ γ opt .One can substitute this for-loop by finding the unique cohort optimal threshold, τ GA opt , using the groupaverage FC.

4. 1 .
Weak-recoverability sub-interval (a w , b w ) Based on panel (A) of figure 3, we see that for most Schaefer granularity levels (except for n = 100), the lower and upper bound of theoretically guaranteed sub-interval of

Figure 3 :
Figure 3: Panel (A) is the weak-recoverability sub-interval of τ ∈ (a w , b w ) ⊊ [0, 1] (Step 2).Panel (B) is the 5-and 95-percentile of individual subjects' SNR for four distinct Schaefer parcellations n = 100, 300, 600, 900.Panel (C) illustrates the SNR null models.Panel (D) is the FC density, on a logarithmic scale, across the same 4 granularity levels.Panel (E) shows SNR profiles computed on group-average FCs (again, over the same granularity levels).Finally, Panel (F) reports the optimal threshold τ opt computed based on the maximum SNR of group-average FCs.Note that in panels (D) and (E) the weak-recoverability sub-intervals use the maximum and minimum values for the upper and lower bound, respectively, across Schaefer parcellations.

Figure 4 :
Figure 4: fMRI task and rest SNR profiles.For each row of panels, the left one represents SNR behavior across threshold range τ ∈ [0, 1] using the maximum scanning length for all fMRI tasks and rest.The gray shade represents the 5-to 95-percentile of SNR task regime across all fMRI tasks and the entire threshold range.In the right panel, SNR profiles for taskand resting-state fMRI are computed using 166 time points corresponding to the scanning length of the shortest scanned task, i.e., EMOTION.Results are for 4 Schaefer parcellation levels: n = [100, 300, 600, 900].

Figure 6 :
Figure 6: Maximum SN R computed for Resting state of Schaefer Group-average FC with n = 300 for increasing scanning lengths, starting at 100 to 1000 time points, increments of 100 each.

Figure 7 :
Figure 7: Panel (A) -left figure represents the modularity score of a thresholded group-average FC across threshold range τ ∈ [0, 0.85].Panel (A) -right figure reports the normalized mutual information between the inferred partition (using Q-score maximization heuristics) and Yeo's FN partition.The same order goes to Panel (B).Panel (B) represents the results of the SNR approach.Note that the full threshold range is not necessary because in the sub-interval τ ∈ [0.9, 1.0], the thresholded graph is almost (if not) empty.The displayed result is for the group-average FCs, over four Schaefer granularity levels n = [100, 300, 600, 900].
level.Besides the individual level FC (denoted as F C γ ), the group-average FC (denoted as F C GA ) is computed using the entry-wise mean across the individual FCs (denoted as F C): F C GA = Γ γ=1 F C γ Γ where Γ denotes the number of subjects and γ ∈ [Γ].Specifically, for each Schaefer granularity and threshold combination, we compute the number of components for each individual and group-average FC.

Figure S1 :
Figure S1: Panel (A) represents the number of connected components for each Schaefer parcellation (from 100 to 900 nodes with an increment of 100 nodes each time), across the pre-defined thresholding range τ ∈ [0, 1].Panel (B) represents the overlap in the number of components of the group-average FC for each Schaefer parcellation.Panel (C) shows the differential change (in %) between two consecutive numbers of component statistics across τ for group-average FCs (top) and the mean of individual subject FCs (bottom).
(a) FN-Differential Identifiability I F N s dif f score for Language fMRI task (LR scanning pattern).(b) FN-Differential Identifiability I F N s dif f for resting state (LR scanning pattern).

Figure
Figure S2: FN-Differential Identifiability I dif f for resting state and one particular fMRI task across all threshold and Schaefer Granularity Level combinations.